Protocol Buffers is powerful, but like any tool, there are right ways and wrong ways to use it. Follow these best practices to avoid common pitfalls, ensure backward compatibility, and build schemas that stand the test of time.
These aren't just theoretical rules. They're battle-tested practices from teams running protobuf in production at scale. We'll use simple examples so you can apply them immediately.
Schema Design Best Practices
✓ DO: Keep Messages Small and Focused
Each message should represent one thing. Don't create giant "god messages" with everything.
❌ BAD:
    // Too much in one message
    message UserData {
      string name = 1;
      string email = 2;
      string address = 3;
      string phone = 4;
      repeated Order orders = 5;
      repeated Payment payments = 6;
      AccountSettings settings = 7;
      // ... 20 more fields
    }
✓ GOOD:
    // Split into focused messages
    message User {
      string user_id = 1;
      string name = 2;
      string email = 3;
    }

    message UserContact {
      string user_id = 1;
      string phone = 2;
      string address = 3;
    }

    message UserOrders {
      string user_id = 1;
      repeated Order orders = 2;
    }
✓ DO: Use Clear, Descriptive Names
Names should tell you what the field contains. Avoid abbreviations unless they're industry standard.
❌ BAD:
    message Sub {
      string msdn = 1;  // Typo
      string nm = 2;    // Too short
      int32 st = 3;     // Unclear
      bool act = 4;     // Ambiguous
    }
✓ GOOD:
    message Subscriber {
      string msisdn = 1;          // Standard telecom
      string name = 2;            // Clear
      SubscriptionType type = 3;  // Descriptive
      bool is_active = 4;         // Unambiguous
    }
✗ NEVER: Change Field Numbers
Once a field number is assigned, it's permanent. Changing it breaks compatibility completely.
    // Version 1
    message Subscriber {
      string name = 1;
      string email = 2;  // ← This field number is FOREVER
    }

    // Version 2 - WRONG! ❌
    message Subscriber {
      string name = 1;
      string email = 3;  // ← Changed from 2 to 3 = DISASTER!
    }

    // Version 2 - CORRECT! ✓
    message Subscriber {
      string name = 1;
      string email = 2;  // Keep original number
      string phone = 3;  // New field gets new number
    }
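To see why a changed number is so destructive, it helps to look at the wire format: each value is stored under its field number, not its name. The sketch below is a minimal hand-rolled encoder and decoder for string fields only, written purely for illustration. The helper names (`encode_string_field`, `decode_string_fields`) are assumptions and not part of the protobuf library; real code would use the classes generated by protoc.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def read_varint(data: bytes, pos: int):
    """Read one varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_string_field(field_number: int, text: str) -> bytes:
    """Encode one string field as tag, length, UTF-8 payload."""
    payload = text.encode("utf-8")
    tag = (field_number << 3) | 2  # wire type 2 = length-delimited
    return encode_varint(tag) + encode_varint(len(payload)) + payload


def decode_string_fields(data: bytes) -> dict:
    """Decode a message holding only string fields, keyed by field number."""
    fields = {}
    pos = 0
    while pos < len(data):
        tag, pos = read_varint(data, pos)
        length, pos = read_varint(data, pos)
        fields[tag >> 3] = data[pos:pos + length].decode("utf-8")
        pos += length
    return fields


# Bytes written by a Version 1 producer: name = 1, email = 2.
wire = encode_string_field(1, "Asha") + encode_string_field(2, "asha@example.com")
fields = decode_string_fields(wire)

email_for_v1_reader = fields.get(2)      # found under number 2
email_for_wrong_v2_reader = fields.get(3)  # the wrong schema looks under 3
```

The reader built from the wrong Version 2 schema finds nothing under number 3, so the email silently reads as empty while the data sits unused under number 2.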
Field Numbering Best Practices
✓ DO: Reserve Numbers 1-15 for Frequently Used Fields
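Every field is written with a tag that combines its field number and wire type. For field numbers 1-15 the tag takes 1 byte; for 16 through 2047 it takes 2 bytes. Give your most frequently set fields the low numbers.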
    message CallRecord {
      // Hot path fields (1-15) - used in every message
      string caller_msisdn = 1;    // Always present
      string callee_msisdn = 2;    // Always present
      int64 duration_seconds = 3;  // Always present
      int64 timestamp = 4;         // Always present

      // Less frequent fields (16+)
      string call_id = 16;         // Only for debugging
      repeated string notes = 17;  // Rarely used
      string operator_id = 18;     // Optional metadata
    }
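The 1-byte and 2-byte boundaries follow from how the tag is built: the field number is shifted left by three bits, combined with the wire type, and written as a varint that carries seven payload bits per byte. A small sketch of that arithmetic (the function name `tag_size` is just an illustrative helper):

```python
def tag_size(field_number: int, wire_type: int = 0) -> int:
    """Return how many bytes a field's tag occupies on the wire."""
    tag = (field_number << 3) | wire_type
    size = 1
    while tag >= 0x80:  # each varint byte holds 7 bits of the tag
        tag >>= 7
        size += 1
    return size


sizes = {number: tag_size(number) for number in (1, 15, 16, 2047, 2048)}
print(sizes)
```

The saving is one byte per field per message, so it matters most for small messages sent in very large volumes.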
✓ DO: Reserve Field Numbers for Future Use
When you remove a field, reserve its number so it's never reused accidentally.
    message Subscriber {
      // Reserved numbers from deleted fields
      reserved 2, 5, 9 to 11;
      reserved "old_field_name", "deprecated_status";

      // Active fields
      string msisdn = 1;
      string name = 3;
      bool is_active = 4;

      // Never use 2, 5, 9, 10, or 11 again!
    }
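Once the `reserved` statements are in place, protoc rejects any field that reuses one of those numbers or names. The sketch below mimics that check in plain Python to make the rule concrete, for example for a review script; `expand_reserved` and `reserved_violations` are hypothetical helpers, and the dictionary of fields is a hand-written stand-in for a parsed schema.

```python
def expand_reserved(spec):
    """Expand entries like 2 or (9, 11) into a set; ranges are inclusive."""
    numbers = set()
    for entry in spec:
        if isinstance(entry, tuple):
            first, last = entry
            numbers.update(range(first, last + 1))
        else:
            numbers.add(entry)
    return numbers


def reserved_violations(fields, reserved_numbers, reserved_names):
    """Return (name, number) pairs that reuse a reserved number or name."""
    return [
        (name, number)
        for name, number in fields.items()
        if number in reserved_numbers or name in reserved_names
    ]


reserved_numbers = expand_reserved([2, 5, (9, 11)])  # "reserved 2, 5, 9 to 11;"
reserved_names = {"old_field_name", "deprecated_status"}

current = {"msisdn": 1, "name": 3, "is_active": 4}
proposed = {"msisdn": 1, "name": 3, "email": 10}  # 10 falls inside 9 to 11

ok = reserved_violations(current, reserved_numbers, reserved_names)
bad = reserved_violations(proposed, reserved_numbers, reserved_names)
```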
✓ DO: Leave Gaps for Future Fields
Don't number fields 1, 2, 3, 4... Leave space so you can add related fields later.
    message Subscriber {
      // Identity fields (1-10)
      string msisdn = 1;
      string email = 2;
      string name = 3;
      // 4-10 left open for future identity fields

      // Status fields (11-20)
      bool is_active = 11;
      SubscriptionType type = 12;
      // 13-20 left open for future status fields

      // Metadata fields (21-30)
      int64 created_at = 21;
      int64 updated_at = 22;
      // 23-30 left open for future metadata
    }
Choosing the Right Data Types
✓ DO: Use Fixed-Width Types for Performance-Critical Fields
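Varints shrink small values but grow large ones: a value of 2^56 or more needs 9 or 10 bytes as a varint, versus a constant 8 bytes as fixed64 (for fixed32, the break-even point is 2^28). For fields that are almost always that large, fixed-width types are smaller and cheaper to decode.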
    message TelemetryData {
      // Use fixed64 for timestamps (always 8 bytes, faster)
      fixed64 timestamp_nanos = 1;

      // Use fixed32 for hashes (always 4 bytes)
      fixed32 checksum = 2;

      // Use regular int64 for IDs (variable length, smaller for small numbers)
      int64 user_id = 3;

      // Use sint32 for negative numbers (better encoding)
      sint32 temperature_celsius = 4;
    }
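The size trade-offs in the comments above can be checked with a little arithmetic. This sketch computes payload sizes only (the tag byte is excluded); the function names are illustrative, and the timestamp is an arbitrary example value.

```python
def varint_size(value: int) -> int:
    """Bytes needed to encode a non-negative integer as a varint."""
    size = 1
    while value >= 0x80:
        value >>= 7
        size += 1
    return size


def int64_size(value: int) -> int:
    """Payload size of an int64 field; negatives go out as 64-bit two's complement."""
    return varint_size(value & 0xFFFFFFFFFFFFFFFF)


def sint32_size(value: int) -> int:
    """Payload size of a sint32 field, which ZigZag-encodes before the varint step."""
    zigzag = ((value << 1) ^ (value >> 31)) & 0xFFFFFFFF
    return varint_size(zigzag)


timestamp_nanos = 1_700_000_000_000_000_000  # example nanosecond timestamp

as_varint = varint_size(timestamp_nanos)  # 9 bytes, versus 8 for fixed64
small_id = int64_size(42)                 # 1 byte: varints favor small values
cold_as_int = int64_size(-40)             # 10 bytes: sign bits fill the varint
cold_as_sint = sint32_size(-40)           # 1 byte: ZigZag maps -40 to 79
```

So fixed64 wins for values that are always huge, a plain varint wins for small non-negative values, and sint32 or sint64 wins when values are often negative.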
✓ DO: Use Enums Instead of Strings for Fixed Values
Enums are more efficient and type-safe than strings. Plus, you catch typos at compile time.
❌ BAD:
    message Subscriber {
      string status = 1;  // "active", "inactive",
                          // "suspended", "activee" (typo!)
    }
✓ GOOD:
    enum SubscriberStatus {
      STATUS_UNSPECIFIED = 0;  // Zero is the default when the field is not set
      ACTIVE = 1;
      INACTIVE = 2;
      SUSPENDED = 3;
    }

    message Subscriber {
      SubscriberStatus status = 1;  // Type-safe!
    }
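The efficiency gain is easy to quantify: an enum is sent as a varint, so any value below 128 costs one byte, while a string costs a length byte plus its full UTF-8 text. A rough sketch of the arithmetic for a field numbered 1 (helper names are illustrative):

```python
def varint_size(value: int) -> int:
    """Bytes needed to encode a non-negative integer as a varint."""
    size = 1
    while value >= 0x80:
        value >>= 7
        size += 1
    return size


def string_field_size(text: str, field_number: int = 1) -> int:
    """Total wire size of a string field: tag + length + UTF-8 payload."""
    payload = text.encode("utf-8")
    tag = (field_number << 3) | 2  # wire type 2 = length-delimited
    return varint_size(tag) + varint_size(len(payload)) + len(payload)


def enum_field_size(value: int, field_number: int = 1) -> int:
    """Total wire size of an enum field: tag + varint value."""
    tag = (field_number << 3) | 0  # wire type 0 = varint
    return varint_size(tag) + varint_size(value)


as_string = string_field_size("suspended")  # 1 + 1 + 9 bytes
as_enum = enum_field_size(2)                # 1 + 1 bytes
```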
✓ DO: Use Appropriate String Types
Use string for UTF-8 text and bytes for binary data.
    message DataPacket {
      // Use string for human-readable text
      string customer_name = 1;     // UTF-8 text
      string description = 2;       // UTF-8 text

      // Use bytes for binary data
      bytes encrypted_payload = 3;  // Binary data
      bytes file_contents = 4;      // Binary data
      bytes checksum = 5;           // Binary hash
    }
Versioning and Backward Compatibility
✓ DO: Always Add New Fields, Never Remove
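Adding fields is safe. Removing a field breaks code that still reads it and frees its number for accidental reuse. Prefer marking the field deprecated; if you must delete it, reserve its number and name.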
    message Subscriber {
      string msisdn = 1;
      string name = 2;

      // Old field we don't want anymore - DON'T DELETE!
      string old_status = 3 [deprecated = true];  // Mark deprecated

      // New replacement field
      SubscriberStatus status_v2 = 4;  // Use this instead

      // New field added in v2
      string email = 5;  // Safe to add anytime
    }
✓ DO: Make All Fields Optional (proto3 default)
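proto3 has no required fields: any field may be absent from the wire, and an absent field reads back as its default value. This makes evolution easier. Add the optional keyword when you need to tell "never set" apart from "set to the default".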
    // proto3 - all fields optional by default
    message Subscriber {
      string msisdn = 1;   // Can be missing
      string name = 2;     // Can be missing
      bool is_active = 3;  // Can be missing (defaults to false)

      // If you really need to know if a field was set:
      optional string email = 4;  // Explicitly optional (has presence)
    }
✓ DO: Use Default Values Wisely
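Remember that for fields without explicit presence, proto3 doesn't send default values (0, false, empty string) over the wire, so a reader cannot tell "set to the default" from "never set". Design with this in mind.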
    message FeatureFlags {
      // Good: false is the safe default
      bool enable_new_feature = 1;  // Default false = feature off

      // Careful: false might be meaningful
      bool is_verified = 2;  // false could mean "not verified" or "field not set"

      // Better: use an enum when false is meaningful
      enum VerificationStatus {
        VERIFICATION_UNKNOWN = 0;  // Explicit unknown state
        VERIFIED = 1;
        NOT_VERIFIED = 2;
      }
      VerificationStatus verification = 3;
    }
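The ambiguity is visible at the byte level. The sketch below imitates how a proto3 serializer treats a field without presence tracking: a default value produces no bytes at all. These two helpers are simplified stand-ins written for illustration (they only handle field numbers 1-15 and values below 128), not the real library.

```python
def encode_bool_field(field_number: int, value: bool) -> bytes:
    """Encode a bool with no presence tracking: the default (False) is omitted."""
    if not value:
        return b""
    tag = (field_number << 3) | 0  # wire type 0 = varint
    return bytes([tag, 1])


def encode_enum_field(field_number: int, value: int) -> bytes:
    """Encode an enum with no presence tracking: the zero value is omitted."""
    if value == 0:
        return b""
    tag = (field_number << 3) | 0
    return bytes([tag, value])


never_set = b""                               # is_verified never touched
explicit_false = encode_bool_field(2, False)  # is_verified = false

verified = encode_enum_field(3, 1)      # VERIFIED
not_verified = encode_enum_field(3, 2)  # NOT_VERIFIED
unknown = encode_enum_field(3, 0)       # VERIFICATION_UNKNOWN
```

With the bool, "explicitly false" and "never set" are the same empty byte string. With the enum, NOT_VERIFIED is a real value on the wire, and only the deliberately meaningless zero value is omitted.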
Performance Best Practices
✓ DO: Put Frequently Accessed Fields First
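Parsers can skip fields they don't need, and serializers typically write fields in field-number order. Give hot data low field numbers so it gets one-byte tags and sits at the front of the message, and keep rarely read data in higher numbers. Treat this as a small optimization and measure before relying on it.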
    message CallRecord {
      // Hot path - checked on every message
      string caller_msisdn = 1;  // Read immediately
      string callee_msisdn = 2;  // Read immediately
      int64 duration = 3;        // Read immediately

      // Cold path - rarely accessed
      repeated string debug_logs = 20;    // Usually skipped
      map<string, string> metadata = 21;  // Usually skipped
    }
✓ DO: Use Repeated Instead of Maps When Possible
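Maps are convenient, but in some language runtimes they cost more to build and iterate than a repeated field, and their iteration order is undefined. When you only need a list of pairs, or need to preserve order, use a repeated message instead.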
⚠️ SLOWER:
    message Subscriber {
      // Map has overhead
      map<string, string> tags = 1;
    }
✓ FASTER:
    message Tag {
      string key = 1;
      string value = 2;
    }

    message Subscriber {
      // Repeated is faster
      repeated Tag tags = 1;
    }
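On the wire, the two definitions above produce the same bytes: the language guide documents a map field as equivalent to a repeated entry message with key = 1 and value = 2. Any speed difference therefore comes from the in-memory representation in your language runtime, not from message size. The hand-rolled encoder below is an illustrative sketch of that layout, not the protobuf library.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_string_field(field_number: int, text: str) -> bytes:
    """Encode one string field as tag, length, UTF-8 payload."""
    payload = text.encode("utf-8")
    tag = (field_number << 3) | 2  # wire type 2 = length-delimited
    return encode_varint(tag) + encode_varint(len(payload)) + payload


def encode_pairs(field_number: int, pairs) -> bytes:
    """Encode (key, value) pairs as repeated entry messages: key = 1, value = 2."""
    out = bytearray()
    for key, value in pairs:
        entry = encode_string_field(1, key) + encode_string_field(2, value)
        tag = (field_number << 3) | 2
        out += encode_varint(tag) + encode_varint(len(entry)) + entry
    return bytes(out)


# Layout shared by "map<string, string> tags = 1" and "repeated Tag tags = 1".
wire = encode_pairs(1, [("plan", "gold")])
print(wire.hex(" "))
```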
✓ DO: Reuse Message Objects
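Creating a new message object for every input allocates memory each time. When processing many messages, reusing one object can cut allocations. The gain depends on the language runtime, so measure before and after.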
⚠️ SLOWER:
    # Python - creates a new object each time
    for data in stream:
        msg = Subscriber()  # New allocation
        msg.ParseFromString(data)
        process(msg)
✓ FASTER:
    # Python - reuse one object
    msg = Subscriber()  # One allocation
    for data in stream:
        msg.Clear()  # Reset fields (ParseFromString also clears first)
        msg.ParseFromString(data)
        process(msg)  # process() must not keep a reference to msg
Organization and Maintainability
✓ DO: Use Packages to Organize Schemas
Packages prevent naming conflicts and organize related messages.
    // Good organization
    syntax = "proto3";

    package telecom.subscriber.v1;  // Clear namespace

    message Subscriber {
      string msisdn = 1;
    }

    message SubscriberList {
      repeated Subscriber subscribers = 1;
    }
✓ DO: Add Comments Generously
Future you (and your team) will thank you. Explain non-obvious fields.
    message CallRecord {
      // Mobile Subscriber ISDN (international format, e.g., +91-9876543210)
      // Must include country code for international calls
      string caller_msisdn = 1;

      // Call duration in seconds. Rounded down to nearest second.
      // Billing is based on this value, so ensure accuracy.
      int64 duration_seconds = 2;

      // Unix timestamp in milliseconds when call started.
      // Uses server time (UTC) not client time.
      int64 start_timestamp_ms = 3;
    }
✓ DO: Version Your Package Names
Include the version in the package name for major, breaking changes. The old and new versions can then coexist while clients migrate gradually.
    // subscriber_v1.proto
    package telecom.subscriber.v1;

    message Subscriber {
      string msisdn = 1;
      string name = 2;
    }

    // subscriber_v2.proto (major breaking changes)
    package telecom.subscriber.v2;

    message Subscriber {
      string msisdn = 1;
      PersonName name = 2;  // Changed from string to message
      // Can coexist with v1 during migration
    }
Common Mistakes to Avoid
❌ Mistake #1: Using required Fields (proto2)
Problem: A required field can never be safely removed or relaxed, because readers built with the old schema reject any message that lacks it.
Solution: Use proto3 (all fields optional) or make everything optional in proto2.
❌ Mistake #2: Reusing Field Numbers
Problem: Old messages will parse into wrong fields.
Solution: Use reserved for deleted field numbers.
❌ Mistake #3: Storing Large Binary Data Directly
Problem: The entire message must be loaded into memory before it can be parsed.
Solution: Store large files elsewhere, keep references in protobuf.
❌ Mistake #4: Not Planning for Extension
Problem: Schema becomes unmaintainable as it grows.
Solution: Leave gaps in field numbers for related fields, and reserve the numbers of any fields you delete.
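For Mistake #3, "keep references in protobuf" usually means the message carries a small pointer to the blob plus enough metadata to verify it. The sketch below shows one possible shape using a plain dataclass as a stand-in for a generated message; `BlobRef`, its fields, and the `blob://` URI are all hypothetical names chosen for illustration.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class BlobRef:
    """Stand-in for a small protobuf message that points at an external blob."""
    uri: str
    sha256_hex: str
    size_bytes: int


def make_blob_ref(uri: str, content: bytes) -> BlobRef:
    """Describe a blob that has been stored elsewhere under the given URI."""
    digest = hashlib.sha256(content).hexdigest()
    return BlobRef(uri=uri, sha256_hex=digest, size_bytes=len(content))


def verify_blob(ref: BlobRef, content: bytes) -> bool:
    """Check that fetched content matches the size and hash in the reference."""
    return (
        len(content) == ref.size_bytes
        and hashlib.sha256(content).hexdigest() == ref.sha256_hex
    )


recording = b"\x00" * 1_000_000  # pretend this is a 1 MB call recording
ref = make_blob_ref("blob://recordings/call-123.wav", recording)
```

The message that travels through your system then holds under a hundred bytes or so of reference data, and consumers fetch the blob itself only when they need it.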
Quick Reference Checklist
✓ DO
- ✓ Use clear, descriptive field names
- ✓ Reserve 1-15 for frequent fields
- ✓ Add comments to your schema
- ✓ Use enums for fixed values
- ✓ Reserve deleted field numbers
- ✓ Version your package names
- ✓ Keep messages small and focused
✗ DON'T
- ✗ Change field numbers, ever
- ✗ Reuse field numbers
- ✗ Use required fields (proto2)
- ✗ Create "god messages"
- ✗ Store huge binary blobs directly
- ✗ Use non-standard abbreviations in names
- ✗ Delete fields without reserving them (prefer marking deprecated)
Related Resources
External References
Official Documentation
- Proto3 Language Guide - Official syntax reference
- API Best Practices - Google's official recommendations
- Encoding Guide - Understanding wire format
Conclusion
Following these best practices will save you from painful debugging sessions and breaking changes. Protocol Buffers is incredibly powerful when used correctly, but small mistakes can cause big problems down the line.
Start with good schema design, never change field numbers, and always think about backward compatibility. Your future self and your team will thank you. Remember: it's easier to design it right the first time than to fix it later when you have millions of messages in production.