Protocol Buffers (Protobuf) is a compact binary serialization format with solid Python support. Whether you're building gRPC services or microservices, or simply need efficient data storage, Protobuf and Python work well together.
This tutorial covers installation, creating schemas, serialization, deserialization, and real-world examples. If you're new to Protobuf, start with our What is Protobuf guide.
What You'll Learn: How to install Protobuf, create .proto files, generate Python code, serialize/deserialize data, and use Protobuf in real Python applications.
Installation
Step 1: Install Protocol Buffer Compiler
# macOS (using Homebrew)
brew install protobuf

# Ubuntu/Debian
sudo apt-get install protobuf-compiler

# Windows (using Chocolatey)
choco install protoc

# Or download from GitHub:
# https://github.com/protocolbuffers/protobuf/releases

# Verify installation
protoc --version
# Output: libprotoc 3.21.12 (or similar)
Step 2: Install Python Protobuf Library
# Install protobuf library
pip install protobuf

# Verify installation
python -c "import google.protobuf; print(google.protobuf.__version__)"
# Output: 4.25.1 (or similar)
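If you prefer to check both installation steps from Python, the following is a minimal sketch. The function name `check_protobuf_setup` is made up for this example; it only relies on `protoc --version` and `google.protobuf.__version__`, both shown above.

```python
import shutil
import subprocess


def check_protobuf_setup():
    """Report the protoc and Python protobuf versions, or None if missing."""
    report = {"protoc": None, "python_protobuf": None}

    # Look for the compiler on PATH and ask it for its version string.
    protoc_path = shutil.which("protoc")
    if protoc_path is not None:
        result = subprocess.run(
            [protoc_path, "--version"],
            capture_output=True,
            text=True,
            check=False,
        )
        report["protoc"] = result.stdout.strip() or None

    # Import the runtime library only if it is installed.
    try:
        import google.protobuf
        report["python_protobuf"] = google.protobuf.__version__
    except ImportError:
        pass

    return report


if __name__ == "__main__":
    for tool, version in check_protobuf_setup().items():
        print(f"{tool}: {version or 'not found'}")
```

A value of "not found" for either entry means the corresponding step above still needs to be done.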
Creating Your First .proto File
A .proto file defines your data structure. Let's create a simple user schema:
user.proto
syntax = "proto3";
// Define the User message
message User {
  int32 id = 1;
  string name = 2;
  string email = 3;
  bool is_active = 4;
  repeated string hobbies = 5;
}
Key points: Each field has a type, a name, and a unique number (called a "tag"). The repeated keyword makes hobbies a list.
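To see why the field numbers matter, here is a small sketch of how the Protobuf wire format combines a field number with a wire type into the key that precedes each value. The names `USER_FIELDS` and `field_key` are illustrative and are not part of the Protobuf library.

```python
# Wire types used by the fields in user.proto.
WIRE_VARINT = 0  # int32, bool
WIRE_LEN = 2     # string (including each element of a repeated string)

# (field name, field number, wire type) for the User message above.
USER_FIELDS = [
    ("id", 1, WIRE_VARINT),
    ("name", 2, WIRE_LEN),
    ("email", 3, WIRE_LEN),
    ("is_active", 4, WIRE_VARINT),
    ("hobbies", 5, WIRE_LEN),
]


def field_key(field_number, wire_type):
    """Combine a field number and wire type into the key value."""
    return (field_number << 3) | wire_type


for name, number, wire_type in USER_FIELDS:
    print(f"{name:<10} number={number} key=0x{field_key(number, wire_type):02x}")
```

The field name never appears in the encoded data, only the number does. That is why you can rename a field safely, but changing its number breaks compatibility with data that was already serialized.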
Compiling to Python Code
Use the protoc compiler to generate Python code from your schema:
Compile Command
# Compile user.proto to Python
protoc --python_out=. user.proto

# This generates: user_pb2.py
The --python_out=. flag tells protoc to generate Python code in the current directory. The generated file is named user_pb2.py.
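If you want to run the compile step from a build script rather than by hand, one possible sketch looks like this. The helper names `build_protoc_command`, `generated_module_path`, and `compile_proto` are hypothetical; the only protoc flags used are `--python_out` from above and `--proto_path`, which tells protoc where to look for the schema file.

```python
import subprocess
from pathlib import Path


def build_protoc_command(proto_file, out_dir="."):
    """Return the protoc argument list for compiling one .proto file."""
    proto_path = Path(proto_file)
    return [
        "protoc",
        f"--proto_path={proto_path.parent}",
        f"--python_out={out_dir}",
        str(proto_path),
    ]


def generated_module_path(proto_file, out_dir="."):
    """Return where protoc is expected to write the *_pb2.py module."""
    return Path(out_dir) / f"{Path(proto_file).stem}_pb2.py"


def compile_proto(proto_file, out_dir="."):
    """Run protoc and return the path of the generated module."""
    # check=True raises CalledProcessError if protoc reports an error.
    subprocess.run(build_protoc_command(proto_file, out_dir), check=True)
    return generated_module_path(proto_file, out_dir)


if __name__ == "__main__":
    print(" ".join(build_protoc_command("user.proto")))
```

Calling `compile_proto("user.proto")` requires protoc on your PATH and a user.proto file in the current directory.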
Serialization: Python Object → Binary
Now let's use the generated code to create and serialize a User object:
Creating and Serializing
import user_pb2

# Create a User object
user = user_pb2.User()
user.id = 1
user.name = "Alice"
user.email = "[email protected]"
user.is_active = True
user.hobbies.append("reading")
user.hobbies.append("coding")

# Serialize to binary format
binary_data = user.SerializeToString()
print(f"Serialized size: {len(binary_data)} bytes")
print(f"Binary data: {binary_data}")

# Output:
# Serialized size: 45 bytes
# Binary data: b'\x08\x01\x12\x05Alice\x1a\[email protected] ...'
The SerializeToString() method converts the User object to a compact binary format. Despite its name, it returns a bytes object, not a str.
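To get a feel for what ends up in those bytes, the sketch below hand-encodes a few User fields in pure Python. It is not how the library is implemented. The helper names are invented for illustration, it only handles non-negative integers and strings, and it leaves out the email field to keep the example short.

```python
def encode_varint(value):
    """Encode a non-negative integer as a Protobuf varint."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_key(field_number, wire_type):
    """Encode the key that precedes every field value."""
    return encode_varint((field_number << 3) | wire_type)


def encode_varint_field(field_number, value):
    """Encode an int32 or bool field (wire type 0)."""
    return encode_key(field_number, 0) + encode_varint(value)


def encode_string_field(field_number, text):
    """Encode a string field (wire type 2): key, byte length, UTF-8 bytes."""
    payload = text.encode("utf-8")
    return encode_key(field_number, 2) + encode_varint(len(payload)) + payload


encoded = (
    encode_varint_field(1, 1)               # id = 1
    + encode_string_field(2, "Alice")       # name = "Alice"
    + encode_varint_field(4, 1)             # is_active = True
    + encode_string_field(5, "reading")     # hobbies[0]
    + encode_string_field(5, "coding")      # hobbies[1]
)
print(encoded)
print(f"{len(encoded)} bytes")
```

The first bytes, `\x08\x01\x12\x05Alice`, are the same prefix that appears in the serialized output above: key 0x08 and value 1 for id, then key 0x12, length 5, and the text for name.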
Deserialization: Binary → Python Object
Read binary data back into a Python object:
Deserializing Data
import user_pb2
# Assume we have binary_data from serialization
# binary_data = user.SerializeToString()
# Create a new User object
new_user = user_pb2.User()
# Parse binary data
new_user.ParseFromString(binary_data)
# Access the fields
print(f"ID: {new_user.id}")
print(f"Name: {new_user.name}")
print(f"Email: {new_user.email}")
print(f"Active: {new_user.is_active}")
print(f"Hobbies: {list(new_user.hobbies)}")
# Output:
# ID: 1
# Name: Alice
# Email: [email protected]
# Active: True
# Hobbies: ['reading', 'coding']
The ParseFromString() method reads binary data and populates the object fields.
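As a rough picture of what parsing involves, here is a pure-Python sketch that walks through encoded bytes and yields each field it finds. It is only an illustration, not the library's parser. The function names are made up, it handles only the varint and length-delimited wire types that user.proto uses, and the sample bytes are a hand-built User with id, name, is_active, and two hobbies.

```python
def read_varint(data, pos):
    """Return (value, next_pos) for the varint starting at data[pos]."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: last byte of the varint
            return result, pos
        shift += 7


def decode_fields(data):
    """Yield (field_number, wire_type, value) for each encoded field."""
    pos = 0
    while pos < len(data):
        key, pos = read_varint(data, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:  # varint: int32, bool
            value, pos = read_varint(data, pos)
        elif wire_type == 2:  # length-delimited: string
            length, pos = read_varint(data, pos)
            value = data[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} is not handled in this sketch")
        yield field_number, wire_type, value


sample = b"\x08\x01\x12\x05Alice\x20\x01\x2a\x07reading\x2a\x06coding"
for field_number, wire_type, value in decode_fields(sample):
    print(field_number, wire_type, value)
```

Notice that the decoder only sees field numbers. The generated user_pb2 code is what maps number 2 back to name and number 5 back to hobbies.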
Reading and Writing Protobuf Files
Writing to File
import user_pb2

# Create user
user = user_pb2.User()
user.id = 1
user.name = "Bob"
user.email = "[email protected]"
user.is_active = True

# Write to file
with open('user.pb', 'wb') as f:
    f.write(user.SerializeToString())

print("User saved to user.pb")
Reading from File
import user_pb2
# Read from file
user = user_pb2.User()
with open('user.pb', 'rb') as f:
    user.ParseFromString(f.read())
print(f"Loaded user: {user.name}")
Working with Nested Messages
Protobuf supports nested message types for complex data structures:
company.proto
syntax = "proto3";
message Company {
  string name = 1;
  message Address {
    string street = 1;
    string city = 2;
    string country = 3;
  }
  Address headquarters = 2;
  repeated Employee employees = 3;
}
message Employee {
  int32 id = 1;
  string name = 2;
  string department = 3;
}
Using Nested Messages in Python
import company_pb2
# Create company
company = company_pb2.Company()
company.name = "TechCorp"
# Set nested Address
company.headquarters.street = "123 Tech St"
company.headquarters.city = "San Francisco"
company.headquarters.country = "USA"
# Add employees (repeated field)
emp1 = company.employees.add()
emp1.id = 1
emp1.name = "Alice"
emp1.department = "Engineering"
emp2 = company.employees.add()
emp2.id = 2
emp2.name = "Bob"
emp2.department = "Sales"
# Serialize
data = company.SerializeToString()
print(f"Company data: {len(data)} bytes")
Converting Between Protobuf and JSON
The Python Protobuf library provides utilities to convert between Protobuf and JSON:
Protobuf to JSON
from google.protobuf.json_format import MessageToJson
import user_pb2

# Create user
user = user_pb2.User()
user.id = 1
user.name = "Alice"
user.email = "[email protected]"

# Convert to JSON
json_string = MessageToJson(user)
print(json_string)

# Output (fields still at their default values, such as is_active and
# hobbies here, are omitted by default):
# {
#   "id": 1,
#   "name": "Alice",
#   "email": "[email protected]"
# }
Perfect for debugging or when you need human-readable output.
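One detail that can be surprising: JSON output uses lowerCamelCase keys, so the is_active field from user.proto appears as isActive. The function below is a simplified sketch of that name mapping, written for illustration only. `to_json_name` is a made-up helper, not a library function, and edge cases such as consecutive underscores may be handled differently by the real implementation.

```python
def to_json_name(proto_field_name):
    """Convert a snake_case proto field name to lowerCamelCase."""
    head, *rest = proto_field_name.split("_")
    # Capitalize the first letter of every part after the first one.
    return head + "".join(part[:1].upper() + part[1:] for part in rest)


for field in ("id", "name", "email", "is_active", "hobbies"):
    print(f"{field} -> {to_json_name(field)}")
```

If you would rather keep the names exactly as written in the .proto file, MessageToJson accepts a preserving_proto_field_name=True argument.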
JSON to Protobuf
from google.protobuf.json_format import Parse
import user_pb2
json_string = '''
{
"id": 2,
"name": "Bob",
"email": "[email protected]",
"isActive": true
}
'''
# Parse JSON into Protobuf object
user = Parse(json_string, user_pb2.User())
print(f"Name: {user.name}")
print(f"Active: {user.is_active}")
# Output:
# Name: Bob
# Active: True
Useful for accepting JSON input and converting to Protobuf.
Real-World Example: User Service
Let's build a complete example that saves and loads user data:
user_service.py
import user_pb2
import os


class UserService:
    def __init__(self, storage_path='users.pb'):
        self.storage_path = storage_path

    def save_user(self, user_id, name, email, is_active=True, hobbies=None):
        """Save a user to file"""
        user = user_pb2.User()
        user.id = user_id
        user.name = name
        user.email = email
        user.is_active = is_active
        if hobbies:
            user.hobbies.extend(hobbies)
        # Write to file
        with open(self.storage_path, 'wb') as f:
            f.write(user.SerializeToString())
        print(f"User {name} saved successfully!")
        return user

    def load_user(self):
        """Load user from file"""
        if not os.path.exists(self.storage_path):
            print("No user file found")
            return None
        user = user_pb2.User()
        with open(self.storage_path, 'rb') as f:
            user.ParseFromString(f.read())
        return user

    def display_user(self, user):
        """Display user information"""
        print("\nUser Information:")
        print(f" ID: {user.id}")
        print(f" Name: {user.name}")
        print(f" Email: {user.email}")
        print(f" Active: {user.is_active}")
        print(f" Hobbies: {', '.join(user.hobbies)}")


# Usage
if __name__ == "__main__":
    service = UserService()
    # Save a user
    user = service.save_user(
        user_id=1,
        name="Alice",
        email="[email protected]",
        is_active=True,
        hobbies=["reading", "coding", "hiking"]
    )
    # Load and display
    loaded_user = service.load_user()
    if loaded_user:
        service.display_user(loaded_user)
Best Practices
• Open Protobuf files in binary mode: use the 'wb' and 'rb' modes
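One thing to keep in mind with the file examples in this tutorial: each one stores a single message per file, and UserService overwrites the file on every save. Serialized Protobuf messages do not record their own length, so if you want several messages in one file you need to add your own framing. The sketch below shows one possible approach, a 4-byte big-endian length prefix. The prefix format and the names `write_framed` and `read_framed` are assumptions made for this example, and the byte strings stand in for the result of user.SerializeToString().

```python
import io
import struct


def write_framed(stream, payload):
    """Write one serialized message preceded by its 4-byte length."""
    stream.write(struct.pack(">I", len(payload)))
    stream.write(payload)


def read_framed(stream):
    """Yield each payload written by write_framed until the stream ends."""
    while True:
        header = stream.read(4)
        if not header:
            return  # clean end of stream
        if len(header) < 4:
            raise ValueError("truncated length prefix")
        (length,) = struct.unpack(">I", header)
        payload = stream.read(length)
        if len(payload) < length:
            raise ValueError("truncated message")
        yield payload


# Stand-ins for two serialized User messages (id and name only).
payloads = [b"\x08\x01\x12\x05Alice", b"\x08\x02\x12\x03Bob"]

buffer = io.BytesIO()
for payload in payloads:
    write_framed(buffer, payload)

buffer.seek(0)
restored = list(read_framed(buffer))
print(restored == payloads)
```

With real messages, you would pass each payload from `read_framed` to ParseFromString() on a fresh user_pb2.User() object, and use a file opened in 'wb' or 'rb' mode in place of the in-memory buffer.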
Summary
Protocol Buffers with Python provides a powerful, efficient way to serialize and deserialize data. Whether you're building microservices, mobile backends, or data pipelines, Protobuf offers significant performance benefits.
• Install protoc and the protobuf Python library
• Define schemas in .proto files
• Generate Python code with protoc
• Use SerializeToString() and ParseFromString()
• Convert to/from JSON when needed for debugging
Next Steps: Learn how Protobuf compares to JSON or explore gRPC with Python for building high-performance APIs.